Abstract

The representation of a biochemical network as a graph is the coarsest
level of description in cellular biochemistry. By studying the network
structure one can draw conclusions about the large-scale organisation
of the biochemical processes. We describe methods for extracting
hierarchies of subnetworks, and show how these can be interpreted and
further deconstructed to find autonomous subnetworks. The large-scale
organisation we find is characterised by a tightly connected core
surrounded by increasingly loosely connected substrates.
1 Introduction



At the coarsest level of description, cellular biochemistry can be
represented as a network of vertices (substrates) linked by chemical
reactions. For both conceptual and analytical purposes, the vastness and
complexity of these biochemical networks call for a division into smaller
subunits. This is nothing new: traditionally, biochemists have talked
about functional subnetworks composed of biochemical pathways, the citric
acid cycle being one example. As modern-day genomics gives an increasingly
comprehensive picture of the biochemical network, one would like to
complement the traditional way of mapping out subnetworks with objective
graph-theoretical methods. With such methods we can address not only the
question of which relevant subnetworks there are, but also the
hierarchical organisation of subnetworks (can subnetworks be said to
consist of smaller subnetworks, and so on), as well as more fundamental
questions about the contexts in which the subnetwork concept is relevant
and when the biochemical circuitry is to be considered as a functional
whole.

The graph-theoretical signature of a subnetwork is that it is internally
densely connected but has relatively few links to the rest of the graph.
Other methods for detecting subnetworks [11, 8, 6] have been based on
local properties such as the number of reactions a substrate takes part
in, or the similarity of the neighbourhood. Since non-local features can
heavily affect network dynamics, one would prefer methods that take these
into account. Here, we discuss global algorithms for subnetwork detection,
in particular methods based on the betweenness centrality measure.



2 Preliminaries 

2.1 Biochemical networks as bipartite graphs 

A bipartite graph¹ consists of two types of vertices and links that only
go between vertices of different types. We represent the biochemical
networks as directed bipartite graphs $G = (S, R, L)$, where $S$ is a set
of vertices representing substrates, $R$ is a set of vertices representing
chemical reactions, and $L$ is the set of directed links: ordered pairs of
one vertex in $S$ and one vertex in $R$. The links are such that if the
substrates $s_1, \ldots, s_n$ are involved in a reaction $r \in R$ with
products $s'_1, \ldots, s'_{n'} \in S$, then we have
$(s_1, r), \ldots, (s_n, r) \in L$ and
$(r, s'_1), \ldots, (r, s'_{n'}) \in L$. The number of links leading to a
vertex is called its in-degree and denoted $k_{\mathrm{in}}$.
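As a minimal sketch of this representation (not the authors' implementation), the triple $G = (S, R, L)$ can be built from a reaction table. The input format, a dictionary mapping a reaction name to a pair (reactants, products), and the function names `build_bipartite` and `in_degree` are assumptions made for illustration only.

```python
def build_bipartite(reactions):
    """Build G = (S, R, L) from {reaction: (reactants, products)}.

    The input format is an assumption of this sketch. Substrate and
    reaction names are assumed not to collide.
    """
    S, R, L = set(), set(), set()
    for r, (reactants, products) in reactions.items():
        R.add(r)
        for s in reactants:
            S.add(s)
            L.add((s, r))  # link from substrate to reaction
        for s in products:
            S.add(s)
            L.add((r, s))  # link from reaction to product
    return S, R, L


def in_degree(v, L):
    """Number of links in L that lead to vertex v (k_in)."""
    return sum(1 for (_, head) in L if head == v)


# Hypothetical toy reaction: A + B -> C
S, R, L = build_bipartite({"r1": (["A", "B"], ["C"])})
```

Here the reaction vertex `r1` receives one link from each of its two substrates, so its in-degree is the number of substrates it consumes.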



2.2 Betweenness centrality 

Roughly speaking, the betweenness centrality $C_B$ of a vertex $v$ in an
undirected graph is the number of shortest paths between pairs of vertices
that pass through $v$. For the purposes of this work we are interested in
reaction vertices that are central for paths between metabolites or other
molecules; thus we restrict our definition of betweenness to the reaction
vertices only. The precise definition then becomes:

$$C_B(r) = \sum_{s \in S} \sum_{s' \in S \setminus \{s\}} \frac{\sigma_{ss'}(r)}{\sigma_{ss'}} , \qquad (1)$$

where $\sigma_{ss'}(r)$ is the number of shortest paths between $s$ and
$s'$ that pass through $r$, and $\sigma_{ss'}$ is the total number of
shortest paths between $s$ and $s'$. Since all substrates need to be
present for a reaction to occur, it is meaningful to rescale the
betweenness by the in-degree:

$$c_B(r) = C_B(r) / k_{\mathrm{in}}(r) . \qquad (2)$$

We call $c_B$ the effective betweenness of $r$.

¹ Or, to be precise, a two-mode representation of a bipartite graph. The
formal definition of bipartiteness is just that a graph contains no odd
circuits.
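Equations 1 and 2 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' C implementation. It assumes that shortest paths follow the direction of the links, that the network is given as a dictionary `{reaction: (reactants, products)}`, and that substrate and reaction names do not collide. The path counting uses Brandes-style accumulation, a standard technique swapped in here, restricted to substrate end-points.

```python
from collections import deque


def effective_betweenness(reactions):
    """Return {reaction: c_B} following Eqs. 1 and 2 (directed paths assumed)."""
    succ, k_in = {}, {}
    for r, (reactants, products) in reactions.items():
        succ.setdefault(r, set()).update(products)
        k_in[r] = len(set(reactants))
        for s in reactants:
            succ.setdefault(s, set()).add(r)
        for s in products:
            succ.setdefault(s, set())
    substrates = set(succ) - set(reactions)
    C = {r: 0.0 for r in reactions}

    for s in substrates:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma, dist, preds, order = {s: 1}, {s: 0}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path fractions; only substrates s' != s are targets.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            target = 1.0 if (w in substrates and w != s) else 0.0
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (target + delta[w])
            if w in C:
                C[w] += delta[w]

    # Eq. 2: rescale by the in-degree of the reaction vertex.
    return {r: (C[r] / k_in[r] if k_in[r] else 0.0) for r in reactions}


# Hypothetical toy networks: a chain a -> b -> c, and a + b -> c.
chain = effective_betweenness({"r1": (["a"], ["b"]), "r2": (["b"], ["c"])})
joint = effective_betweenness({"r1": (["a", "b"], ["c"])})
```

In the second toy network the single reaction lies on both substrate-to-product paths, so $C_B = 2$, and dividing by its in-degree of 2 gives $c_B = 1$.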



2.3 Girvan and Newman's algorithm 

The algorithm we use for tracing subnetworks is due to Girvan and Newman
(GN), but in a form adapted to bipartite representations of biochemical
networks, as presented elsewhere. The idea of the algorithm is based on
the fact that vertices that lie between densely connected areas have high
betweenness, and vice versa. Thus, by successively removing reaction
vertices with high effective betweenness, one will see the network
disintegrate into subnetworks of decreasing size. Furthermore, the smaller
subnetworks remaining after many iterations will be perfectly contained in
subnetworks found earlier in the execution of the algorithm; thus the
method produces a full hierarchy of subnetworks.

The precise definition of the algorithm is to repeat the following steps
until no reaction vertices remain:

1. Calculate the effective betweenness $c_B(r)$ for all reaction vertices.

2. Remove the reaction vertex with highest effective betweenness and
all its in- and out-going links.

3. Save information about the current state of the network.

If many reaction vertices have the same $c_B$ in step 2 we remove all of
them at once. A C implementation of this algorithm along with test data
sets can be found at www.tp.umu.se/forskning/networks/meta/.
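The three steps above can be sketched as a short loop. This is an illustration under stated assumptions, not the C implementation referred to above: the network is a dictionary `{reaction: (reactants, products)}`, the caller supplies a function `score` that is assumed to return the effective betweenness $c_B$ of every remaining reaction, and the "state of the network" saved in step 3 is taken to be the list of connected substrate clusters. The names `gn_hierarchy` and `substrate_clusters` are hypothetical.

```python
def substrate_clusters(substrates, reactions):
    """Connected clusters of substrates, ignoring link direction."""
    parent = {s: s for s in substrates}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Every remaining reaction joins all substrates it touches.
    for reactants, products in reactions.values():
        members = [*reactants, *products]
        for m in members[1:]:
            parent[find(m)] = find(members[0])
    clusters = {}
    for s in substrates:
        clusters.setdefault(find(s), set()).add(s)
    return sorted(clusters.values(), key=lambda c: (-len(c), sorted(c)))


def gn_hierarchy(reactions, score):
    """Repeat steps 1-3 until no reaction vertices remain."""
    remaining = dict(reactions)
    substrates = {s for reactants, products in reactions.values()
                  for s in (*reactants, *products)}
    history = [substrate_clusters(substrates, remaining)]
    while remaining:
        c = score(remaining)                       # step 1
        top = max(c.values())
        for r in [r for r, val in c.items() if val == top]:
            del remaining[r]                       # step 2, ties removed together
        history.append(substrate_clusters(substrates, remaining))  # step 3
    return history


# Hypothetical chain a -> b -> c -> d. The fixed scores are the c_B values
# of the intact chain worked out by hand; they stand in for a real
# betweenness routine, which would recompute c_B in every iteration.
toy = {"r1": (["a"], ["b"]), "r2": (["b"], ["c"]), "r3": (["c"], ["d"])}
fixed = {"r1": 3.0, "r2": 4.0, "r3": 3.0}
history = gn_hierarchy(toy, lambda rs: {r: fixed[r] for r in rs})
```

The successive entries of `history` are the levels of the hierarchy: each cluster at a later level is contained in a cluster at an earlier one, which is what allows the result to be drawn as a tree. Ties are detected by exact equality here; a tolerance may be preferable for floating-point scores.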
Figure 1: The hierarchical clustering tree for the metabolic network of
T. pallidum. The inset shows substrate names for a blow-up of the tree
(indicated by black). Substrates labelled in the inset: CoA,
D-glucosamine 1-phosphate, N-acetyl-D-glucosamine 1-phosphate,
S-acetyldihydrolipoamide, dihydrolipoamide, deoxyguanosine,
deoxy-D-ribose 1-phosphate, deoxyadenosine, guanine, guanosine,
α-D-ribose 1-phosphate, α-D-ribose 1-pyrophosphate, adenine, adenosine,
hypoxanthine, inosine, orthophosphate.



3 A case-study: T. pallidum 

To illustrate the output of the algorithm, and how it can be
post-processed, we choose the metabolic network of T. pallidum, the
causative agent of syphilis, as obtained from the WIT database.²



3.1 The large scale shape of the hierarchy trees 

The subnetwork hierarchy of T. pallidum's metabolic network is presented
as a tree (a so-called dendrogram) in Fig. 1. The end-points at the base
of the dendrogram represent the substrate vertices of the metabolic
network. The vertical dimension represents the hierarchical level: if a
horizontal line is drawn across the dendrogram, the vertices connected
below the line belong to the same cluster (connected subgraph) at that
particular level of the hierarchical organisation. The further down the
tree two vertices are connected, the more tightly connected they are in
the biochemical network. If one substrate is to be converted to another
that is separated from the first one high up in the dendrogram, then a
long chain of reactions is needed. If, on the other hand, the two vertices
are connected near the bottom of the dendrogram, then they are probably
both present in one or more reactions.

² This is the same data as used in earlier studies and thus slightly
outdated, but it should work well for illustrating the method.

The most striking feature of Fig. 1, and indeed of any of the 43
organisms of Ref. 5, is that the network has one dominating cluster at
most levels of the hierarchy. As the algorithm proceeds (one goes from top
to bottom of the dendrogram), a few vertices at a time peel off from the
largest connected cluster. The emerging picture is that the large-scale
structure of the metabolic network has a tightly connected core and
increasingly loosely connected outer 'shells.' A few rather well-defined
subnetworks are identified, however, for example the subnetworks of
Fig. 1 containing reactions associated with purine metabolism and
pyruvate/acetyl-CoA conversion.

3.2 Criteria for identifying subnetworks 

We can identify subnetworks by looking at the hierarchy tree: if a
subnetwork is isolated at some level (like the N-acetyl-D-glucosamine
1-phosphate, D-glucosamine 1-phosphate, dihydrolipoamide,
S-acetyldihydrolipoamide, CoA, and acetyl-CoA network of Fig. 1 at level
h), then it is comparatively well connected within itself relative to its
surroundings. If the cluster is isolated close to the top of the
dendrogram, then it is not very entangled in the wirings of metabolic
pathways, and is likely to be a reasonably autonomously functioning
module. Can we establish objective criteria for subnetworks to be regarded
as meaningful modules? For example, Ref. 10 detects modules in an indirect
way using a very weak criterion: roughly speaking, that substrates are
likely to belong to the same module if they appear in reactions involving
the same set of other substrates. To identify groups in social networks,
Radicchi et al. suggested two criteria that, adapted to biochemical
networks, become as follows. If, during the iterations of the GN
algorithm, an isolated vertex set $S' \subset S$ fulfils the following
criterion, it is said to be a weak community:

$$\sum_{s \in S'} \kappa_{\mathrm{in}}(s) > \sum_{s \in S'} \kappa_{\mathrm{out}}(s) , \qquad (3)$$

and a strong community if:

$$\kappa_{\mathrm{in}}(s) > \kappa_{\mathrm{out}}(s) \quad \text{for all } s \in S' , \qquad (4)$$

where $\kappa_{\mathrm{in}}(s)$ is the number of $s' \in S'$ that are
products of a reaction involving the substrate $s \in S'$, and
$\kappa_{\mathrm{out}}(s)$ is the number of $s' \in S \setminus S'$ that
are products of a reaction involving $s$. Loosely speaking, Eq. 3 means
that there are, on average, more feedback pathways back into $S'$ than
pathways leading out to the rest of the network. If the strong condition
(Eq. 4) holds, then the products of all reactions involving substrates
$s \in S'$ are more likely to belong to $S'$ than not. It turns out that
Eq. 4 is fulfilled for almost no cluster at any but the lowest level of
the hierarchy (closest to the bottom of the dendrogram). Eq. 3 is, on the
other hand, fulfilled for the largest cluster throughout all iterations of
the algorithm. (This picture persists for all 43 WIT organisms studied in
Ref. 5.) That the subnetworks of cellular biochemistry almost completely
lack the community structure of social networks, or the component
structure of electronic devices, does not necessarily mean that it is
futile to talk of biochemical modules. For a subnetwork to have some
degree of autonomy it has to have some self-regulatory function, and thus
a feedback loop. To implement this idea, consider the subnetworks with
substrate vertex set $S'$ that fulfils:



$$L(S') \geq \Lambda |S'| , \qquad (5)$$

where $L(S')$ is the number of vertices in $S'$ that lie on an elementary
cycle (a closed non-self-intersecting path) consisting only of vertices in
$S'$ and of length larger than three, $|S'|$ is the number of vertices in
$S'$, and the parameter $\Lambda \in [0, 1]$ is the required fraction of
feedback-loop vertices. We test the three cases where $\Lambda$ equals 0,
1/2 and 1, corresponding to the subnetwork having at least one feedback
loop, more than half of the substrates, or every substrate participating
in a feedback loop, respectively. The largest cluster close to the top of
the dendrogram quite naturally fulfils Eq. 5 when $\Lambda$ is small (in
our case 0 or 1/2); therefore we detect subnetworks starting from the
bottom of the dendrogram and going upwards. With each one of these
criteria we find non-trivial subnetworks. Of the subnetworks of Fig. 1,
the hardest requirement, $\Lambda = 1$, detects two relevant subnetworks:
the one containing CoA and the innermost one containing orthophosphate
(α-D-ribose 1-phosphate, α-D-ribose 1-pyrophosphate, adenine, adenosine,
hypoxanthine, inosine, and orthophosphate). The extended orthophosphate
subnetwork still connected at level h (also containing e.g. guanine) is
regarded as a valid subnetwork with $\Lambda = 1/2$, but not with
$\Lambda = 1$. Assigning an appropriate $\Lambda$ requires a careful look
at the problem in question, but as a rule of thumb $\Lambda$ close to one
seems sensible for most applications.

4 Conclusions

Finding subnetworks of cellular biochemistry is an important task for
modern bioinformatics, for both conceptual and analytical purposes. There
are two general ways to proceed: either one searches for small building
blocks (cf. Ref. 12) or one tries to deconstruct the whole network. Our
approach falls into the second category. By adapting an algorithm for
subnetwork detection to biochemical networks we construct hierarchy trees,
or dendrograms, representing the whole hierarchical organisation of
subnetworks of biochemical pathways. We find that biochemical networks
cannot be divided into subnetworks as easily as, e.g., acquaintance
networks and electronic circuits. Against this backdrop it is not
surprising that some recent criteria (Eqs. 3 and 4) for extracting
meaningful social subnetworks fail to give non-trivial results. As a
remedy we propose conditions based on the presence of feedback loops
within a subnetwork. The above methods are illustrated by an application
to the metabolic network of T. pallidum; we have also tested them on the
metabolic and whole-cellular networks (containing e.g. transmembrane
transport and signal transduction) of 42 other organisms of the WIT
database [7], and obtained sensible output.
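For readers who want to experiment with the community criteria of Eqs. 3 and 4 mentioned above, the following is a minimal sketch under stated assumptions. The network is given as a dictionary `{reaction: (reactants, products)}`; "a reaction involving a substrate" is read as the substrate being a reactant; and the weak criterion is read as the summed form of the strong one, i.e. the total of $\kappa_{\mathrm{in}}$ over $S'$ exceeds the total of $\kappa_{\mathrm{out}}$. The function names are hypothetical.

```python
def kappa(s, subset, reactions):
    """Return (kappa_in, kappa_out) for substrate s relative to subset S'.

    kappa_in counts distinct products inside S', kappa_out distinct
    products outside S', over all reactions that consume s (assumption).
    """
    inside, outside = set(), set()
    for reactants, products in reactions.values():
        if s in reactants:
            for p in products:
                (inside if p in subset else outside).add(p)
    return len(inside), len(outside)


def is_weak_community(subset, reactions):
    """Summed criterion (Eq. 3 as read here): sum kappa_in > sum kappa_out."""
    pairs = [kappa(s, subset, reactions) for s in subset]
    return sum(i for i, _ in pairs) > sum(o for _, o in pairs)


def is_strong_community(subset, reactions):
    """Eq. 4: kappa_in(s) > kappa_out(s) for every s in S'."""
    return all(i > o for i, o in
               (kappa(s, subset, reactions) for s in subset))


# Hypothetical toy network: a <-> b exchange with one leak from a to c.
leaky = {"r1": (["a"], ["b"]), "r2": (["b"], ["a"]), "r3": (["a"], ["c"])}
closed = {"r1": (["a"], ["b"]), "r2": (["b"], ["a"])}
weak_leaky = is_weak_community({"a", "b"}, leaky)
strong_leaky = is_strong_community({"a", "b"}, leaky)
strong_closed = is_strong_community({"a", "b"}, closed)
```

In the leaky example the set {a, b} passes the summed test but fails the per-substrate test, because substrate a has as many products outside the set as inside it.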